Introduction

There is a significant heritable component of prostate cancer. Increased familial relative risk is observed across multiple populations: male first-degree relatives of prostate cancer patients have a two- to three-fold increased risk. Segregation analyses support genetic rather than shared environmental risk, and twin cancer concordance studies reveal a higher heritable risk for prostate cancer than for any other common cancer. Additional epidemiological studies are consistent with X-linked transmission, finding a higher risk for a man with an affected brother than for one with an affected father. Despite this strong evidence of genetic predisposition, identifying prostate cancer susceptibility genes has been difficult. Linkage studies have identified several loci that have proven difficult to confirm across study populations. However, summary studies of genome-wide scans for prostate cancer susceptibility loci generally confirm two loci, HPC-1 and HPC-X.

Our study seeks to identify a candidate gene or genes conferring prostate cancer susceptibility at the HPC-X locus in a US Caucasian study population. We hypothesize that one or more genes at HPC-X harbor common, moderate-penetrance variants predisposing to prostate cancer. By examining shared haplotypes in founder populations, we found two intervals likely to harbor prostate cancer susceptibility genes. We have chosen to focus first on one interval at HPC-X (termed HPC-X region A) because of shared-haplotype association evidence in the Finnish, Icelandic and Ashkenazi founder populations.

Body

Accomplishments

Results from Tasks 1-3 have been published in the journal Human Genetics (Hum Genet. 2008 May;123(4):379-86). A summary by task appears below, and the manuscript is attached. Tables referenced below are from the attached manuscript.

Task 1. To identify and genotype all common polymorphisms in our study population at potential genes of the candidate interval (HPC-X Region A) (Months 1-12):

a. Perform de novo SNP discovery at predicted or known genes, and derive from dbSNP a set of survey SNPs spanning the HPC-X locus at a spacing of 3-5 kb

b. Genotype a subset of the study population for all SNPs in 1a

c. Analyze genotypes to determine the genetic architecture of HPC-X

Tasks 1a-1c have been completed.

To derive a set of survey SNPs spanning the interval at HPC-X, we assayed SNPs found in dbSNP for polymorphism in a subset of 40 prostate cancer cases from the training dataset. Of 415 SNPs culled from database entries, 194 were polymorphic and assayable in our study population. To augment this set, we undertook de novo SNP discovery in the same 40 prostate cancer cases at the known and predicted genes in the region, identified in Figure 1. In addition to the known genes SPANXC and LDOC1, custom software identified a coding region with homology to RPL44 and a pseudogene with homology to RBMX2. De novo SNP discovery at these four features yielded 52 additional SNPs, for a total of 246 SNPs. Linkage disequilibrium (LD) patterns at HPC-X Region A for these 246 SNPs, typed in a subset of 141 controls from our training population, are presented in Figure 1. Four major blocks of LD are apparent; block "A" contains all four known and predicted genes.


Figure 1: Linkage disequilibrium (LD) patterns at HPC-X Region A for 128 tagging SNPs in 141 controls. SNPs encompassing genes and 10 kb of each flank are positioned at the top of the figure. Four major blocks are observed, marked A, B, C and D. Red, D' = 1 (LOD > 2); blue, D' = 1 (LOD < 2); pink, D' < 1 (LOD > 2); white, D' < 1 (LOD < 2).
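The color classes in this caption combine two pairwise statistics, D' and a LOD score for linkage disequilibrium. The sketch below shows one way to compute them for two biallelic SNPs. It assumes phased data, which holds here because each male carries a single X chromosome, and it places the LOD cut-off at 2 in the manner of Haploview's usual color scheme (counting LOD exactly 2 as "strong" is an assumption). The names `pairwise_ld` and `ld_color` are hypothetical; this is not Haploview's own code.

```python
import math
from collections import Counter


def pairwise_ld(snp_1, snp_2):
    """Return (D, D', r^2, LOD) for two biallelic SNPs.

    snp_1, snp_2: equal-length sequences of alleles, one entry per male
    subject. Because males carry one X chromosome, each pair of entries
    is already a phased two-SNP haplotype.
    """
    n = len(snp_1)
    if n == 0 or n != len(snp_2):
        raise ValueError("need two non-empty allele lists of equal length")
    alleles_1, alleles_2 = sorted(set(snp_1)), sorted(set(snp_2))
    if len(alleles_1) != 2 or len(alleles_2) != 2:
        raise ValueError("both SNPs must be biallelic in this sample")
    a, b = alleles_1[0], alleles_2[0]
    p_a = sum(1 for x in snp_1 if x == a) / n
    p_b = sum(1 for y in snp_2 if y == b) / n
    hap_counts = Counter(zip(snp_1, snp_2))
    d = hap_counts[(a, b)] / n - p_a * p_b
    # D' scales |D| by its maximum possible value given the allele frequencies
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = abs(d) / d_max
    r2 = d * d / (p_a * (1 - p_a) * p_b * (1 - p_b))
    # LOD: log10 likelihood ratio of the observed haplotype frequencies versus
    # the frequencies expected under linkage equilibrium (no LD)
    freq_1 = {a: p_a, alleles_1[1]: 1 - p_a}
    freq_2 = {b: p_b, alleles_2[1]: 1 - p_b}
    lod = sum(
        count * (math.log10(count / n) - math.log10(freq_1[x] * freq_2[y]))
        for (x, y), count in hap_counts.items()
    )
    return d, d_prime, r2, lod


def ld_color(d_prime, lod, tol=1e-9):
    """Map (D', LOD) to the four color classes used in the figure caption."""
    complete = d_prime >= 1 - tol  # D' = 1, allowing for rounding error
    strong = lod >= 2              # LOD == 2 counted as strong (assumption)
    if complete:
        return "red" if strong else "blue"
    return "pink" if strong else "white"


# Two SNPs in perfect LD across 100 males
snp_1 = ["A"] * 50 + ["G"] * 50
snp_2 = ["C"] * 50 + ["T"] * 50
d, d_prime, r2, lod = pairwise_ld(snp_1, snp_2)
print(round(d_prime, 3), round(r2, 3), round(lod, 1), ld_color(d_prime, lod))  # 1.0 1.0 30.1 red
```

D' reaches 1 whenever one of the four two-SNP haplotypes is absent, whereas r² reaches 1 only when two haplotypes are absent, which is why r² rather than D' is the usual criterion for choosing tagging SNPs.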


Task 2. To determine a set of tagging SNPs across our candidate interval at HPC-X, to genotype them in the training dataset, and to test for significant association with risk of prostate cancer

a. Determine a set of tagging SNPs across our candidate interval at HPC-X from the 246 SNPs typed in Task 1

b. Genotype tagging SNPs in the remainder of the training dataset population

c. Using single-allele and sliding-window haplotype analyses, determine haplotype windows statistically associated with risk of prostate cancer

Tasks 2a-2c have been completed.
Using LDSelect, we determined a set of 128 tagging SNPs (r² threshold of 0.9) and typed them in the remainder of our training dataset (N = 292 cases, 292 age-matched controls). All haplotype windows statistically associated with risk of prostate cancer (nominal P < 0.05) within the candidate interval fell in four distinct regions; these are shown in Table 2 of the attached manuscript. Regions are listed from 5' to 3' across the candidate interval, and individual SNPs in each haplotype are identified by an internal seven-digit code. Case and control frequencies are given for the haplotype window with the most significant P value, colored in black; all other statistically significant haplotype windows are colored in grey.


Task 3. To confirm associated prostate cancer gene variants in a second study population, and to extend the investigation to an African American study population (Months 24-36)

a. Ascertain an independent population to be used as a test dataset

b. Confirm or refute areas of statistical association from Task 2 in the test dataset

c. Extend findings into an African American study population, currently under ascertainment

Tasks 3a and 3b have been completed. Task 3c is in progress.

We recently ascertained an independent test dataset of 215 prostate cancer probands with a family history of disease and 215 age-matched controls, and used it to confirm or refute the statistical associations seen in the training dataset. For each of the four candidate risk haplotypes, we identified haplotype-tagging SNPs (htSNPs) encompassing the entire associated region and tested for association with risk of prostate cancer in the test dataset (Table 3 in the attached manuscript). Case and control frequencies differ from those reported in Table 2 because only htSNPs were used to define each haplotype, which allowed inclusion of some samples previously dropped from analysis due to missing data. Haplotype 3 was statistically significant in the test population (6.6% of cases, 2.5% of controls; P = 0.04). This haplotype spans the boundary between LD blocks "A" and "B" in Figure 1, overlying a recombination hotspot. The result remained significant after permutation-based adjustment for multiple testing (P = 0.048).

Our African American dataset does not yet provide sufficient statistical power, so we are actively increasing our ascertainment efforts.




Key Research Accomplishments

1. Ascertainment of a US Caucasian study population with statistical power to detect common variants that may predispose to prostate cancer.

2. De novo SNP discovery yielding 52 SNPs that were unpublished at the time of discovery in a US Caucasian population. At the time of writing, 32 remain unpublished and have been submitted to dbSNP.

3. Identification of a 22.8 kb region spanning a recombination hotspot, tagged by 6 htSNPs, that is associated with risk of prostate cancer in both the training and test datasets.

Reportable Outcomes

A Haplotype at Xq27.2 Confers Susceptibility to Prostate Cancer
Hum Genet. 2008 May;123(4):379-86; PMID 18350320
Yaspan BL, McReynolds KM, Elmore JB, Breyer JP, Bradley KM, Smith JR

Investigation of a candidate locus at HPC-X in familial prostate cancer
Poster presentation at the IMPaCT meeting, Hyatt Regency Atlanta, 2007
Yaspan B, McReynolds K, Elmore JB, Breyer J, Bradley K, Smith JR

No association with risk of prostate cancer for LDOC1 and SPANX-C candidate genes within the HPC-X locus in a US Caucasian study population
Poster presentation at the American Society of Human Genetics Meeting, New Orleans, LA, 2006
Yaspan B, Elmore JB, Breyer J, Bradley K, McReynolds K, Smith JR

Conclusions

Over the past two years, we have systematically dissected HPC-X to uncover the variant or variants responsible for its association with risk of prostate cancer. Starting with region A, whose boundaries we identified through shared-haplotype analysis of founder populations, we have identified one haplotype, tagged by 6 htSNPs and spanning a 22.8 kb region that overlies a recombination hotspot, that is associated with risk of prostate cancer. This haplotype was statistically significant for risk of prostate cancer in both our training and test datasets.
Abstract We conducted an association study to identify risk variants for familial prostate cancer within the HPCX locus at Xq27 among Americans of Northern European descent. We investigated a total of 507 familial prostate cancer probands and 507 age-matched controls without a personal or family history of prostate cancer. The study population was subdivided into a set of training subjects, to explore genetic variation at the locus potentially affecting risk of prostate cancer, and an independent set of test subjects, to confirm the association and to assign significance while addressing multiple comparisons. We identified a 22.9 kb haplotype nominally associated with prostate cancer among training subjects (292 cases, 292 controls; χ² = 5.08, P = 0.020) that was confirmed among test subjects (215 cases, 215 controls; χ² = 3.73, P = 0.040). The haplotype predisposed to prostate cancer with an odds ratio of 3.41 (95% CI 1.04-11.17, P = 0.034) among test subjects. The haplotype, extending from rs5907859 to rs1493189, is concordant with a prior study of the region within the Finnish founder population and warrants further independent investigation.

Background

Linkage and genetic epidemiological data support the existence of genetic variants on the X chromosome that predispose to prostate cancer (Woolf 1960). Prostate cancer loci on both arms of the X chromosome have been identified, including the HPCX locus at Xq27-28 (Bochum et al. 2002; Brown et al. 2004; Chang et al. 2005; Cunningham et al. 2003; Farnham et al. 2005; Gillanders et al. 2004; Gudmundsson et al. 2008; Lange et al. 1999; Schleutker et al. 2000; Xu et al. 1998). The ~14 Mb linkage interval of HPCX was originally delineated within US, Swedish, and Finnish hereditary prostate cancer pedigrees (Xu et al. 1998). Further shared haplotype analysis among Finnish probands refined the locus to a candidate interval flanked on either side by a notable 113 kb inverted repeat (Baffoe-Bonnie et al. 2005a, b). The 352 kb region between these inverted repeats was the candidate interval for the present study (Fig. 1), which sought evidence of association with familial prostate cancer among Americans of Northern European descent. Our study population was uniquely composed of independent familial prostate cancer probands matched to controls with no personal or family history of prostate cancer; these two groups represent extremes of potential genetic load for prostate cancer. Our study included a training set of 292 case-control pairs to identify nominal associations, and a test set of 215 case-control pairs to confirm or refute observations within the training set. We conducted extensive allele discovery and validation within the study population, characterized linkage disequilibrium (LD) patterns, and selected tagging SNPs for tests of association by haplotype-based methods. Our investigation comprehensively tested association of the candidate interval with prostate cancer, and included non-unique genomic regions that are not amenable to current high-throughput techniques.

Hum Genet (2008) 123:379-386

[Figure 1 image: map of chrX (q27.1-q27.3) showing STR markers; genes SOX3, CDR1, SPANXB, LDOC1, SPANXC, SPANXA2, SPANXD, MAGEC3, MAGEC2, SPANXA1, MAGEC1, SPANXN4; HPCX-specific repetitive regions]

Fig. 1 HPCX candidate interval. A 3 Mb region of Xq27.1-27.3 is shown at top (red bounding box, NCBI build 36.1 ChrX: 139005624-142009903 bp), depicting previously genotyped STR markers, annotated genes, and a complex HPCX-specific repeat. The candidate interval for study is zoomed at bottom (blue bounding box, ChrX: 140046709-140391709 bp). The interval contains SPANXC and LDOC1, as well as a predicted RPL44 homolog and a pseudogene of RBMX2. Because these genes are members of larger gene families, unique long-range amplimers (denoted) were required to ensure site-specific assays. Polymorphic SNPs (N = 246) are positioned on the map (tagging SNPs in pink). At bottom is a pairwise LD matrix for 141 controls across the subset of 220 SNPs with MAF > 0.05. Red, D' = 1 (LOD > 2); blue, D' = 1 (LOD < 2); pink, D' < 1 (LOD > 2); white, D' < 1 (LOD < 2). Blocks of LD are denoted A, B, C and D; block A contains all four candidate gene regions.


Materials and methods

Samples

Study subjects were Americans of Northern European descent, ascertained with informed consent between 2002 and 2007 from Vanderbilt University Medical Center and from the VA Tennessee Valley Healthcare System (adjacent hospitals) under institutional review board oversight. Subjects were residents of Tennessee (75%), Kentucky (15%), Georgia (2%), Alabama (1%), Mississippi (1%), Virginia (1%), and other states (4%). Familial prostate cancer cases were ascertained at the time of treatment for the principal diagnosis of prostate cancer, and controls were ascertained at the time of routine preventive screening for prostate cancer. All prostate cancer probands included in the study are from pedigrees with a family history of prostate cancer, and all control probands are from pedigrees without a family history of prostate cancer; family history included first- and second-degree relatives. Controls had a screening prostate specific antigen (PSA) test <4 ng/ml at the time of ascertainment, and had no record of a PSA test >4 ng/ml or of an abnormal digital rectal examination. Controls were matched to cases on age (±2.5 years; age at screen for controls, age at diagnosis for cases). Case and control pedigrees were of comparable size: the mean number of at-risk male siblings was 1.7 for cases and 1.8 for controls. Initial accruals included 292 unrelated, independent familial prostate cancer probands and 292 age-matched controls, comprising a training study group. Subsequent accruals included 215 additional unrelated, independent prostate cancer probands and 215 additional age-matched controls, comprising a separate test study group. The geographical distribution (by state of residence) of training and test subjects was not significantly different. Analyses preferentially used the Gleason score of the prostatectomy specimen (available for 87% of cases) over the biopsy Gleason score. Table 1 provides characteristics of the study population.

Genotyping methods

DNA was extracted from whole blood using the Puregene DNA Purification System Standard Protocol (Qiagen, Valencia, CA). DNA was quantified using the PicoGreen dsDNA Quantitation Kit (Invitrogen, Carlsbad, CA) and imaged with a Molecular Devices/LJL Analyst HT (Molecular Devices, Union City, CA). We genotyped SNPs by single nucleotide primer extension and fluorescence polarization, as previously described (Yaspan et al. 2007). Both forward and reverse strand extension primers were tested to select the most robust assay. Amplimer and extension primer sequences for tagging SNPs are provided in Electronic supplementary Table 1.

Candidate polymorphisms

To capture genetic diversity across the candidate interval, SNPs annotated in dbSNP were screened for common polymorphism in a set of 40 independent familial prostate cancer probands from the training study group. This included 415 annotated SNPs on chromosome X between positions 140,036,557 and 140,388,361 (NCBI Build 36.1). The screening set was estimated to provide 98% power to detect a polymorphism with a minor variant frequency of 0.10, and 87% power with a frequency of 0.05. These 40 prostate cancer cases were also used for de novo SNP discovery at known and predicted genes within the candidate interval: 4.6 kb 5' to 0.2 kb 3' of LDOC1; 1.6 kb 5' to 4.4 kb 3' of SPANXC; 3.0 kb 5' to 1.4 kb 3' of a predicted coding region with homology to ribosomal protein L44 ("hRPL44"); and 2.3 kb 5' to 0.8 kb 3' of a predicted pseudogene with homology to RBMX2 ("RBMX2P1"). The latter two annotations were identified by aligning transcript evidence on the genomic map. We employed two redundant single-strand conformation polymorphism methods and re-sequencing for SNP discovery, as previously described (Yaspan et al. 2007). Exons of the four genes were also re-sequenced for all 40 prostate cancer cases in the screening set.
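The quoted power figures can be reproduced under one simple reading, which is our assumption rather than a method stated by the authors: "power" is the probability that the minor allele appears at least once among the screening chromosomes, and the 40 male probands contribute 40 X chromosomes because males are hemizygous for chromosome X. The helper name `detection_power` is hypothetical.

```python
def detection_power(minor_freq, n_chromosomes):
    """Probability that a variant with the given minor-allele frequency is
    observed at least once among n independently sampled chromosomes."""
    if not 0.0 <= minor_freq <= 1.0:
        raise ValueError("minor_freq must lie in [0, 1]")
    if n_chromosomes < 0:
        raise ValueError("n_chromosomes must be non-negative")
    return 1.0 - (1.0 - minor_freq) ** n_chromosomes


# 40 male probands -> 40 X chromosomes in the screening set
for freq in (0.10, 0.05):
    print(f"minor variant frequency {freq:.2f}: power {detection_power(freq, 40):.3f}")
# minor variant frequency 0.10: power 0.985
# minor variant frequency 0.05: power 0.871
```

These values agree with the 98% and 87% stated above to within rounding; requiring the minor allele to be seen at least twice would give noticeably lower power.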

Table 1 Study population characteristics

                              Training            Test                Combined
                              Controls  Cases     Controls  Cases     Controls  Cases
No.                           292       292       215       215       507       507
Mean age^a (years)            63.4      61.3      60.7      60.6      62.3      61.0
Median PSA^a (ng/ml)          0.95      5.7       0.92      5.6       0.92      5.7
Median Gleason sum            -         6         -         6         -         6
Gleason sum ≤6, No.           -         145       -         114       -         259
Gleason sum ≥7, No.           -         130       -         96        -         226
Affected in pedigree, No.^b
  0                           292       -         215       -         507       -
  2                           -         184       -         142       -         326
  ≥3                          -         108       -         73        -         181

a At diagnosis for cases, at screen for controls
b Proband plus first- and second-degree affected relatives

Nested amplification of non-unique regions

Non-unique regions of SPANXC, hRPL44 and RBMX2P1 were assayed using a nested reaction strategy. Unique priming sites flanking each non-unique region were identified, and the region was amplified using the Expand Long Template PCR System (Roche Diagnostics, Indianapolis, IN). Amplimer primers for long-range PCR are provided in Electronic supplementary Table 1. The long-range PCR product was then diluted 1:5,000,000 to reduce carry-over genomic template to non-amplifiable levels while retaining the ability to amplify from the long-range PCR product. This was verified by successful amplification with amplimers nested within the long-range PCR product, but failure of amplimers elsewhere in the genome. Nested amplimers were designed in overlapping fashion along a given long-range amplimer, enabling PCR-based assays; these primers are provided in Electronic supplementary Table 2. High pairwise LD between SNPs within a given non-unique region and flanking unique SNPs supported assay of the correct non-unique copy (visible in Fig. 1). All nested assays within long-range amplimers yielded one allele per subject, concordant with a unique X-chromosomal region in a male.
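The reasoning behind the 1:5,000,000 dilution can be illustrated with a Poisson model of the number of template molecules that reach a nested reaction. This is a hedged sketch: modeling template sampling as Poisson is our assumption, and the two concentrations below are hypothetical placeholders, not values reported in the study.

```python
import math

DILUTION = 5_000_000  # 1:5,000,000 dilution of the long-range PCR product


def copies_after_dilution(copies_per_ul, dilution=DILUTION, volume_ul=1.0):
    """Expected template copies in `volume_ul` of the diluted product."""
    return copies_per_ul * volume_ul / dilution


def p_any_template(mean_copies):
    """Poisson probability that a nested reaction receives >= 1 template copy."""
    return 1.0 - math.exp(-mean_copies)


# Hypothetical concentrations in the undiluted long-range reaction (per uL)
GENOMIC_X_COPIES = 120.0  # carry-over genomic X-chromosome copies (illustrative)
AMPLIMER_COPIES = 4e8     # long-range amplimer copies after PCR (illustrative)

for label, conc in (("genomic carry-over", GENOMIC_X_COPIES),
                    ("long-range amplimer", AMPLIMER_COPIES)):
    mean = copies_after_dilution(conc)
    print(f"{label}: {mean:.2e} copies/uL, P(>=1 copy) = {p_any_template(mean):.5f}")
```

Under these assumed inputs, a 1 uL aliquot of the dilution almost never contains a genomic template molecule but contains about 80 amplimer copies on average, which is the property the nested assay relies on; the control amplifications described above are the empirical check.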

Tag SNP determination

Tagging SNP determination was conducted in a subset of 141 training set control subjects who were genotyped at 246 SNPs (194 validated from dbSNP and 52 identified by de novo discovery). Pairwise LD was visualized using Haploview v4 (Barrett et al. 2005). LDSelect was used for tagging SNP selection, with a MAF threshold of 0.05 and an r² threshold of 0.9 (Carlson et al. 2004). A total of 128 tagging SNPs were selected for assay in the remaining subjects of the training set (totaling 292 independent familial case probands and their 292 age-matched controls). Data were obtained for 96.2% of the 74,752 tagging genotypes sought in the training subjects, with per-marker completion ranging from 88.4 to 100%. SNPs in this tagging set and their assay primers are listed in Electronic supplementary Table 1.
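The tagging step can be sketched as greedy LD binning in the spirit of LDSelect (Carlson et al. 2004). This is a simplified illustration, not LDSelect itself: it keeps SNPs at or above the MAF threshold (whether the boundary is inclusive is our assumption), repeatedly picks the SNP that has r² at or above the threshold with the most remaining SNPs, and uses it as the single tag for that bin, whereas LDSelect can report several interchangeable tags per bin. The function name and inputs are hypothetical. The final line checks that the 74,752 genotypes sought correspond to 128 tagging SNPs in 584 training subjects.

```python
def select_tag_snps(snps, maf, r2, maf_min=0.05, r2_min=0.9):
    """Greedy LD binning (simplified, LDSelect-like).

    snps:  SNP ids in map order
    maf:   dict SNP id -> minor allele frequency
    r2:    dict frozenset({id1, id2}) -> pairwise r^2 (missing pairs = 0)
    Returns a list of (tag_snp, bin_members).
    """
    def ld(s, t):
        return 1.0 if s == t else r2.get(frozenset((s, t)), 0.0)

    order = {s: i for i, s in enumerate(snps)}
    remaining = [s for s in snps if maf[s] >= maf_min]  # MAF filter
    bins = []
    while remaining:
        # SNP that tags the most remaining SNPs; ties go to the earliest on the map
        tag = max(remaining,
                  key=lambda s: (sum(ld(s, t) >= r2_min for t in remaining), -order[s]))
        members = [t for t in remaining if ld(tag, t) >= r2_min]
        bins.append((tag, members))
        remaining = [t for t in remaining if t not in members]
    return bins


# Hypothetical five-SNP example
snps = ["s1", "s2", "s3", "s4", "s5"]
maf = {"s1": 0.20, "s2": 0.20, "s3": 0.30, "s4": 0.02, "s5": 0.40}
r2 = {frozenset(("s1", "s2")): 0.95,
      frozenset(("s2", "s3")): 0.92,
      frozenset(("s1", "s3")): 0.50}
bins = select_tag_snps(snps, maf, r2)
print([tag for tag, _ in bins])  # ['s2', 's5']  (s4 fails the MAF filter)

# Genotypes sought in the training set: 128 tagging SNPs x (292 + 292) subjects
assert 128 * (292 + 292) == 74_752
```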

Statistical analyses

A sliding window approach tested a haplotype window of N markers, sliding the window along the map in single-marker increments (Fallin et al. 2001; Mathias et al. 2006; Yaspan et al. 2007). Each N-marker haplotype was compared to the remaining haplotypes as a group among the training group of 292 cases and 292 matched controls, and the resulting 2×2 contingency table was evaluated by the χ² test statistic. Haplotype windows of 1-10 markers were evaluated in the exploratory analyses of training subjects. In a given map region that was nominally associated with prostate cancer within the training subjects (P < 0.05), haplotype tagging SNPs (htSNPs) were selected that most efficiently distinguished the risk haplotype from others in the region. Nominally significant tagged haplotypes (two observed) were genotyped in a subsequently ascertained independent test group of 215 cases and their 215 matched controls to address multiple comparisons. Significance for a given tagged haplotype candidate was adjusted for the two comparisons among test subjects through permutation testing. We generated 5,000 copies of the data set in which pseudo case status was permuted among cases and controls of the test group. A χ² value for each tagged haplotype was calculated for each simulated data set, as it was for the real data. Since the null hypothesis is true for each randomized subject set, the proportion of simulated χ² values greater than the real χ² value was used as a P value for the association, adjusted for multiple comparisons. Unless specifically noted, P values are unadjusted for multiple comparisons.
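The htSNP selection step described above can be illustrated as a search for the smallest set of SNP positions whose alleles separate the risk haplotype from every other haplotype observed in its region. Reading "most efficiently" as "fewest SNPs" is our assumption, and the exhaustive search shown is only practical for windows of about a dozen markers, which covers the 1-10 marker windows tested here. The function name and example haplotypes are hypothetical.

```python
from itertools import combinations


def select_htsnps(observed_haplotypes, risk):
    """Return the smallest list of positions (0-based) whose alleles in
    `risk` differ, as a combination, from every other observed haplotype."""
    others = {h for h in observed_haplotypes if h != risk}
    for k in range(1, len(risk) + 1):
        for positions in combinations(range(len(risk)), k):
            signature = tuple(risk[i] for i in positions)
            if all(tuple(h[i] for i in positions) != signature for h in others):
                return list(positions)
    return []  # only reached when `risk` is empty


# Hypothetical 4-SNP region with four observed haplotypes
haplotypes = ["CGTA", "CGTG", "TGTA", "CATA"]
print(select_htsnps(haplotypes, "CGTA"))  # [0, 1, 3]
```

A side benefit noted in the text follows directly: a subject needs complete data only at the selected positions, so fewer samples are dropped for missing genotypes.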

A risk haplotype that was significant after adjustment for multiple comparisons among test subjects was subsequently modeled by conditional logistic regression to obtain an estimate of effect size (Intercooled Stata 9, Stata Corporation, College Station, TX). The regression model was adjusted for the matching variable, age (age at diagnosis for cases, age at negative screen for controls). Permutation testing was employed to assign significance.
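For the special case of 1:1 matched pairs and a single binary exposure (carrying the risk haplotype or not, which is well defined for hemizygous males), the conditional maximum-likelihood odds ratio has a closed form: the ratio of the two kinds of discordant pairs. The sketch below uses that closed form with a Wald 95% interval. It is an illustration under those assumptions, not a reproduction of the authors' Stata model (which also included age explicitly), and the pair counts are hypothetical.

```python
import math


def matched_pair_odds_ratio(pairs, z=1.96):
    """Conditional MLE odds ratio for 1:1 matched pairs with a binary exposure.

    pairs: iterable of (case_exposed, control_exposed) booleans.
    Concordant pairs carry no information in the conditional likelihood,
    so only discordant pairs enter the estimate.
    Returns (odds_ratio, lower_95, upper_95) using a Wald interval on log(OR).
    """
    n10 = n01 = 0
    for case_exposed, control_exposed in pairs:
        if case_exposed and not control_exposed:
            n10 += 1
        elif control_exposed and not case_exposed:
            n01 += 1
    if n10 == 0 or n01 == 0:
        raise ValueError("need discordant pairs in both directions")
    odds_ratio = n10 / n01
    se_log = math.sqrt(1 / n10 + 1 / n01)
    lower = math.exp(math.log(odds_ratio) - z * se_log)
    upper = math.exp(math.log(odds_ratio) + z * se_log)
    return odds_ratio, lower, upper


# Hypothetical 215 pairs: 12 case-only carriers, 4 control-only carriers
pairs = ([(True, False)] * 12 + [(False, True)] * 4
         + [(True, True)] * 1 + [(False, False)] * 198)
print(matched_pair_odds_ratio(pairs))
```

Because a rare haplotype yields few discordant pairs, the interval is wide, consistent with the broad confidence intervals reported for haplotype 3.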

Results and discussion

Our allele discovery and characterization were carried out within a screening set of 40 familial prostate cancer probands from the training group. We evaluated 415 SNPs annotated in dbSNP, 194 of which were polymorphic in these subjects. We also undertook de novo SNP discovery across LDOC1 and SPANXC, as well as across an RPL44 homolog and the RBMX2P1 pseudogene. Among these, only LDOC1 resides within a region of unique genomic sequence, so we devised a nested amplification system to allow assay of non-unique genomic sequence flanked by unique sequence. Collectively, we discovered 52 common SNPs amenable to assay. The 246 polymorphic SNPs of the screening subjects (194 from dbSNP, 52 newly discovered) were genotyped in a subset of the training study population (141 cases and 141 controls) for assessment of allele frequency and for tagging SNP selection based on LD patterns. Within these data, 220 SNPs had a minor allele frequency >0.05 and were included in subsequent analyses. Pairwise LD across the candidate interval for these SNPs highlights four LD blocks (denoted A, B, C, and D in Fig. 1).

Among the 220 informative SNPs, we selected 128 tagging SNPs for genotyping in the full group of training subjects (292 familial case probands and 292 age-matched controls). We explored evidence of association with prostate cancer using a haplotype-based sliding window approach, which entailed evaluation of 1,235 haplotype windows across the candidate interval. All haplotype windows of statistical significance were from four distinct regions. At each of the four regions, multiple overlapping windows were consistent with the redundant identification of one haplotype associated with prostate cancer risk. These four candidate risk haplotypes are numbered 1-4 in Table 2.
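The sliding-window scan can be sketched as follows. Windows of 1-10 consecutive markers are slid one marker at a time; within a window, each observed haplotype is compared with all other haplotypes as a group in a 2×2 case-control table. Assumptions: a Pearson χ² without continuity correction, subjects with a missing call in a window dropped from that window, and alleles coded as one character per marker with a hypothetical missing code "N". The authors' exact test settings are not stated, so this sketch need not reproduce the published P values. Counting windows for 128 markers does reproduce the 1,235 windows reported above.

```python
import math
from collections import Counter

MISSING = "N"  # hypothetical missing-genotype code


def chi2_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for [[a, b], [c, d]]."""
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    return 0.0 if denom == 0 else n * (a * d - b * c) ** 2 / denom


def p_value_1df(chi2):
    """Upper-tail P value of a chi-square statistic with 1 degree of freedom."""
    return math.erfc(math.sqrt(chi2 / 2.0))


def windows(n_markers, max_width=10):
    """All (start, width) windows of 1..max_width consecutive markers."""
    for width in range(1, max_width + 1):
        for start in range(n_markers - width + 1):
            yield start, width


def sliding_window_scan(case_haps, control_haps, max_width=10, alpha=0.05):
    """case_haps / control_haps: one allele string per male subject (one X
    chromosome each, so no phasing is required)."""
    n_markers = len(case_haps[0])
    hits = []
    for start, width in windows(n_markers, max_width):
        window = slice(start, start + width)
        cases = [h[window] for h in case_haps if MISSING not in h[window]]
        controls = [h[window] for h in control_haps if MISSING not in h[window]]
        case_counts, control_counts = Counter(cases), Counter(controls)
        for hap in sorted(case_counts.keys() | control_counts.keys()):
            a, c = case_counts[hap], control_counts[hap]  # carriers of this haplotype
            b, d = len(cases) - a, len(controls) - c      # all other haplotypes
            p = p_value_1df(chi2_2x2(a, b, c, d))
            if p < alpha:
                hits.append((start, width, hap, a, c, p))
    return hits


# 128 tagging SNPs with windows of 1-10 markers gives 1,235 windows
print(sum(1 for _ in windows(128, 10)))  # 1235
```

Overlapping windows share markers, so a single underlying risk haplotype tends to appear in many significant windows at once, which is the redundancy described in the paragraph above.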

Table 2 Sliding window risk haplotypes at Xq27, training subjects

Location                                  Cases^b     Controls^b  P^b
Haplotype 1 (hRPL44)                      92 (39.8)   63 (27.3)   0.003
Haplotype 2 (RBMX2P1)                     15 (8.6)    5 (2.9)     0.021
Haplotype 3 (ChrX: 140190766-140213636)   15 (6.9)    3 (1.4)     0.003
Haplotype 4 (ChrX: 140266943-140295222)   21 (8.9)    9 (3.8)     0.024

SNPs spanned by the significant haplotype windows^a, in map order: rs11095852, rs5907823, rs7880499, rs1016824, rs12156848, rs7885649, rs5953563, rs714076, rs845150, rs5907844, rs881223, rs881221, rs881222, rs881219, rs2864937, rs5907848, rs2201245, rs5907851, rs5907859, rs1389194, rs861508, rs845163, rs845164, rs845165, rs845190, rs845188, rs845187, rs845186, rs5907874, rs845182, rs1493189, rs844971, rs5954277, rs844964, rs844963, rs844961, rs844957, rs844956, rs844953, rs6636273, rs844952, rs844946, rs1493192, rs926809

a Sliding haplotype windows of P < 0.05, graphically ordered from most (black) to least significant (left to right)
b P for the haplotype window designated in black, with corresponding numbers of cases and controls and haplotype frequencies (%)

Table 3 Tagged risk haplotypes at Xq27, training and test subjects

                                                  Training                       Test
Location                     htSNP       Allele   Cases      Controls   P        Cases      Controls   P
Haplotype 1                  rs5907823   C        95 (38.6)  71 (28.9)  0.023    73 (39.3)  79 (42.5)
                             rs7880499   G
                             rs1016824   T
                             rs12156848  G
                             rs7885649   A
Haplotype 2 (RBMX2P1)        rs845150    A        19 (6.6)   9 (3.1)    0.062
Haplotype 3                  rs861508    C        18 (6.8)   7 (2.6)    0.020    13 (6.6)   5 (2.5)    0.040
(ChrX: 140190766-140213636)  rs845165    A
                             rs845190    T
                             rs845187    C
                             rs845186    A
                             rs1493189   A
Haplotype 4                  rs844963    A        22 (7.7)   14 (4.9)   0.168
(ChrX: 140266943-140295222)  rs844956    C

Numbers of subjects and tagged haplotype frequencies (%) are indicated

Only a subset of SNPs in each of the four regions was required to distinguish the candidate risk haplotype from the remaining haplotypes. We identified haplotype-tagging SNPs (htSNPs) efficiently capturing the four candidate risk haplotypes of Table 2 (full span of windows with P < 0.05). Because our analysis required complete data for each subject across the multiple SNPs of a haplotype, the restricted set of htSNPs provided a better estimate of haplotype frequency. Only two of the four haplotypes were nominally significant when assessed by htSNPs among training subjects (Table 3: haplotype 1, χ² = 5.24, P = 0.023; haplotype 3, χ² = 5.08, P = 0.020). We evaluated evidence of association between these two htSNP haplotypes and prostate cancer in a second, independent study group of 215 familial prostate cancer probands and 215 age-matched controls. These subjects were accrued after the training subjects over the course of the ongoing study. Numerous exploratory tests were conducted among training subjects, but only two tests were conducted among test subjects, a greatly restricted number of comparisons. Only haplotype 3 was significant among test subjects (Table 3, χ² = 3.73, P = 0.040). Permutation testing was used to correct this value for the two comparisons conducted in test subjects, yielding P = 0.048. Our study identifies haplotype 3 as the most likely genetic variant of the interval to be associated with familial prostate cancer, with a nominal significance of P = 0.003 in the combined training and test subjects.
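The permutation adjustment for the two test-set comparisons can be sketched as below. The text does not say how the two χ² statistics were combined within each permuted data set; comparing each observed χ² with the larger of the two permuted values (a single-step max-statistic approach) is our assumption, as is counting ties as exceedances. Each male is coded as carrying the tagged haplotype or not; the function names and toy data are hypothetical.

```python
import random


def chi2_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for [[a, b], [c, d]]."""
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    return 0.0 if denom == 0 else n * (a * d - b * c) ** 2 / denom


def carrier_chi2(carrier, is_case):
    """Chi-square for haplotype carriers versus non-carriers, cases versus controls."""
    a = sum(1 for h, y in zip(carrier, is_case) if h and y)
    b = sum(1 for h, y in zip(carrier, is_case) if not h and y)
    c = sum(1 for h, y in zip(carrier, is_case) if h and not y)
    d = sum(1 for h, y in zip(carrier, is_case) if not h and not y)
    return chi2_2x2(a, b, c, d)


def max_stat_adjusted_p(carriers_by_hap, is_case, n_perm=5000, seed=1):
    """Permute case status n_perm times; the adjusted P for each haplotype is the
    fraction of permutations whose largest chi-square reaches the observed one."""
    rng = random.Random(seed)
    observed = [carrier_chi2(c, is_case) for c in carriers_by_hap]
    labels = list(is_case)
    exceed = [0] * len(observed)
    for _ in range(n_perm):
        rng.shuffle(labels)  # pseudo case status: the null hypothesis holds by construction
        max_null = max(carrier_chi2(c, labels) for c in carriers_by_hap)
        for i, obs in enumerate(observed):
            if max_null >= obs:
                exceed[i] += 1
    return observed, [e / n_perm for e in exceed]


# Toy data: 100 cases then 100 controls; haplotype 1 enriched in cases, haplotype 2 not
is_case = [True] * 100 + [False] * 100
hap_1 = [True] * 30 + [False] * 70 + [True] * 5 + [False] * 95
hap_2 = [True, False] * 100
obs, adj_p = max_stat_adjusted_p([hap_1, hap_2], is_case, n_perm=1000)
print([round(x, 2) for x in obs], adj_p)
```

Taking the maximum over both tagged haplotypes in each permuted data set is what makes the resulting P value account for having tested two haplotypes, which is why an adjusted value (0.048 in the text) is slightly larger than the nominal one (0.040).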

Under  logistic  regression  modeling  to  assess  effect  size, 
haplotype  3  was  associated  with  prostate  cancer  with  an 
odds  ratio  of  3.41  (95%  Cl  1.04-11.17,  P  =  0.034)  among 
test  subjects,  and  an  odds  ratio  of  2.52  (95%  Cl  1.25-5.10, 
P  =  0.006)  among  combined  training  and  test  subjects.  The 
number  of  cases  with  aggressive  prostate  cancer  among  the 
test  subjects  was  too  few  to  provide  a  significant  estimate  of 
effect  size.  However,  the  226  cases  with  aggressive  prostate 
cancer  (Gleason  score  >7)  among  the  full  study  population 
yielded a more marked odds ratio of 4.06 (95% CI 1.15-14.31,
P = 0.021). Gleason score is among the most important
criteria in defining clinically significant disease. Our
results  are  consistent  with  linkage  data  at  the  locus  under 
stratification  for  clinically  significant  disease  (Chang  et  al. 
2005). 
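The reported odds ratios can be checked for internal consistency. This sketch assumes the 95% intervals are Wald intervals from the logistic regression; the text does not state how they were computed. Under that assumption each interval is symmetric on the log-odds scale. The odds ratio should then equal the geometric mean of its bounds, and the standard error of ln(OR) can be recovered from the interval width. Only numbers quoted above are used. The helper names `wald_ci` and `se_from_ci` are hypothetical.

```python
import math

Z95 = 1.959964  # two-sided 95% standard-normal quantile


def wald_ci(odds_ratio, se_log_or, z=Z95):
    """Wald interval for an odds ratio: exp(ln OR -/+ z * SE(ln OR))."""
    log_or = math.log(odds_ratio)
    return math.exp(log_or - z * se_log_or), math.exp(log_or + z * se_log_or)


def se_from_ci(lower, upper, z=Z95):
    """Back-calculate SE(ln OR) from the bounds of a Wald interval."""
    return (math.log(upper) - math.log(lower)) / (2 * z)


# Odds ratios and 95% CIs as reported in the text: (OR, lower, upper).
reported = {
    "test subjects": (3.41, 1.04, 11.17),
    "training and test subjects": (2.52, 1.25, 5.10),
    "aggressive cases, full population": (4.06, 1.15, 14.31),
}
for label, (odds_ratio, lower, upper) in reported.items():
    se = se_from_ci(lower, upper)
    midpoint = math.sqrt(lower * upper)  # geometric mean of the bounds
    print(f"{label}: OR {odds_ratio}, geometric midpoint {midpoint:.2f}, SE(ln OR) {se:.3f}")
```

All three geometric midpoints round to the reported odds ratios, which is the pattern a Wald interval would produce. Feeding the back-calculated standard error into `wald_ci(3.41, ...)` reproduces the reported bounds to within rounding. The wide intervals reflect standard errors of roughly 0.3 to 0.6 on the log-odds scale.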

The  location  of  the  haplotype  found  to  be  significantly 
associated  with  prostate  cancer  in  this  study  coincides  with 
that  described  through  prior  high-density  simple  tandem 
repeat mapping within a Finnish study population (Baffoe-Bonnie
et al. 2005a, b). Among the simple tandem repeats,


bG82i1.1 was most significantly associated with prostate
cancer in the prior study. The peak associated haplotype
comprised alleles at bG82i1.1 (centromeric) and
bG82i1.0 (telomeric), P = 0.0014. Haplotype 3 of our study
directly overlies the recombination hotspot between LD
blocks A and B of Fig. 1. The most centromeric SNP of the
associated haplotype (rs5907859) is 4.0 kb downstream
from bG82i1.1. The most telomeric SNP of the associated
haplotype (rs1493189) is 1.9 kb downstream from
bG82i1.0. The same genomic region of interest is highlighted
by our present study of Americans of Northern
European  descent  and  the  prior  study  of  Finns.  We  further 
note that among SNPs evaluated in the genome-wide association
study of prostate cancer recently published by Thomas
et al., rs845189 has a Whole Genome Rank of 1,135
out  of  527,869  SNPs  assessed,  with  a  significance  of 
P  =  0.002  (Thomas  et  al.  2008).  This  SNP  resides  at  the  LD 
break  centered  within  the  disease-associated  haplotype  of 
our  study. 

The  associated  haplotype  region  does  not  harbor  known 
genes.  All  missense  variants  of  potential  interest  in  the 
entire  candidate  interval  of  352  kb  were  within  SPANXC, 
30 kb from the associated haplotype. These missense variants
clustered into two LD groups. The first group (all in
exon  1)  included  D17E,  A21V,  and  M24T.  The  second 
group  (all  in  exon  2)  included  P29S,  T30S,  D32Y,  and 
M42L. Within a group, each male subject carried either all of
the first-listed alleles or all of the second-listed alleles.
Additional SPANXC missense variants, E23K, V59F and L68V,
did not appear to be
in  these  two  LD  groups.  This  allele  structure  in  SPANXC  is 
also  evident  in  data  of  an  independent  study  (Kouprina  et  al. 
2007). That study also found no evidence to support an association
between SPANXC alleles and risk of prostate cancer.
The  coding  regions  of  hRPLA4  and  LDOC1  were  without 
missense variants. We classify RBMX2P1 as a pseudogene,
having  a  mutated  initiator  methionine,  multiple  frameshift 
mutations,  and  an  internal  Alu  insertion.  Thus,  the  missense 
variants at SPANXC were among the best potential candidates
for association with prostate cancer at Xq27.
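The X chromosome is hemizygous in men, so each male subject carries a single allele at every variant and phase is known directly. Variants in complete LD (r² = 1) can therefore be grouped by comparing allele patterns across subjects. This is a hypothetical illustration of the SPANXC grouping described above, not the procedure used in the study, and the allele calls are invented toy data. Alleles are coded 0 for the first-listed residue (e.g. D in D17E) and 1 for the second-listed residue.

```python
def perfect_ld_groups(calls):
    """Group variants whose allele patterns agree in every male subject.

    calls maps a variant name to a list of 0/1 allele codes, one per
    hemizygous male in a fixed subject order (0 = first-listed allele,
    1 = second-listed allele). Variants with identical patterns are in
    complete LD: each man carries either every first allele or every
    second allele of the group.
    """
    by_pattern = {}
    for variant, alleles in calls.items():
        by_pattern.setdefault(tuple(alleles), []).append(variant)
    # Report only patterns shared by two or more variants (dicts keep insertion order).
    return [names for names in by_pattern.values() if len(names) > 1]


# Hypothetical allele calls for five men (toy data, not study genotypes).
calls = {
    "D17E": [0, 0, 1, 0, 1],
    "A21V": [0, 0, 1, 0, 1],
    "M24T": [0, 0, 1, 0, 1],
    "E23K": [0, 0, 0, 1, 0],
    "P29S": [0, 1, 1, 0, 0],
    "T30S": [0, 1, 1, 0, 0],
}
print(perfect_ld_groups(calls))  # [['D17E', 'A21V', 'M24T'], ['P29S', 'T30S']]
```

Patterns that are exact complements of each other also have r² = 1 but are not merged here. The 0/1 coding already aligns alleles in the order listed, which matches the "all first or all second" description.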

The haplotype significantly associated with prostate cancer
in this study straddles an LD break, potentially detecting
a pair of contributing components, one located within each
of the two bounding LD blocks (e.g. a gene and a long-range
regulatory element). In a sliding window haplotype
analysis, a haplotype overlapping the two blocks would be
particularly suited to detect such a combination. We considered
the possibility that the causal variants are a pair of
non-contiguous SNPs, one within each LD block. We divided
haplotype  3  so  that  those  SNPs  in  LD  block  A  comprised 
sub-haplotype 3A, and those in block B comprised sub-haplotype
3B. Eighteen of the SNPs within block A and
only one SNP within block B had an r² > 0.8 with the
respective sub-haplotypes. A matrix of pairwise
r values between these SNPs is shown in Electronic supplementary
Fig. 1. The T30S (ss78456788) variant of
SPANXC (in LD block A) also demonstrated modest LD
with sub-haplotype 3A (r² = 0.73). Another variant altering
an open reading frame within RBMX2P1 (rs1968987)
directly marked sub-haplotype 3A (r² = 1).
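For X-linked markers genotyped in men, the r² between a SNP allele and a sub-haplotype reduces to the squared Pearson correlation between two 0/1 indicator vectors. One vector records whether each man carries the SNP allele, the other whether he carries the full sub-haplotype. This sketch assumes that definition. The SNP names s1 to s4 and the genotypes are hypothetical toy data.

```python
def r_squared(x, y):
    """Squared Pearson correlation between two 0/1 indicator vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / n
    vx = sum((a - mx) * (a - mx) for a in x) / n
    vy = sum((b - my) * (b - my) for b in y) / n
    if vx == 0 or vy == 0:
        raise ValueError("r^2 is undefined for a monomorphic indicator")
    return cov * cov / (vx * vy)


def carries(haplotype, genotypes):
    """1 for each man whose alleles match every (SNP, allele) in haplotype."""
    return [int(all(g[snp] == allele for snp, allele in haplotype.items()))
            for g in genotypes]


# Hypothetical hemizygous genotypes for six men (one allele per SNP).
genotypes = [
    {"s1": "A", "s2": "T", "s3": "G", "s4": "C"},
    {"s1": "A", "s2": "T", "s3": "G", "s4": "T"},
    {"s1": "A", "s2": "C", "s3": "A", "s4": "C"},
    {"s1": "G", "s2": "T", "s3": "A", "s4": "C"},
    {"s1": "G", "s2": "C", "s3": "A", "s4": "T"},
    {"s1": "G", "s2": "C", "s3": "A", "s4": "C"},
]
sub_hap = carries({"s1": "A", "s2": "T"}, genotypes)  # [1, 1, 0, 0, 0, 0]
s3_g = [int(g["s3"] == "G") for g in genotypes]
s4_t = [int(g["s4"] == "T") for g in genotypes]
print(round(r_squared(s3_g, sub_hap), 4))  # 1.0: s3 directly marks the sub-haplotype
print(round(r_squared(s4_t, sub_hap), 4))  # 0.0625
```

No phasing step is needed because each man has only one X chromosome. The haplotype indicator can be read straight from the genotypes, which is what makes an r² of exactly 1 between a single SNP and a multi-SNP sub-haplotype possible.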

Only  a  subset  of  the  SNPs  demonstrating  LD  with  the 
two  sub-haplotypes  had  been  genotyped  as  tagging  SNPs  in 
the  training  study  group  to  enable  an  assessment  of  disease 
association. These included rs1012777, ss78456788
(T30S), rs12394263, ss78456800, rs5953578, rs845144,
rs714076, rs881223, and ss78456818 in LD block A, and
rs5907874 in LD block B. With only one exception, the
haplotypes pairing each of these block A SNPs with the
block B SNP were associated with prostate cancer in our
training group by χ² test (P range 0.0008-0.030). These SNP pairs
and  the  original  sliding  window  haplotype  spanning  the  LD 
break  each  detect  the  association  with  prostate  cancer  with 
varying  efficiencies. 
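A two-SNP haplotype spanning the LD break can be formed by requiring a man to carry a chosen allele at one block A SNP and a chosen allele at the block B SNP. It can then be tested like any other haplotype. This sketch assumes a Pearson χ² test on carriers versus non-carriers with 1 degree of freedom and no continuity correction; the text does not state these details. The SNP names A1 and B1 and the counts are hypothetical.

```python
import math


def chi2_1df(a, b, c, d):
    """Pearson chi-square (no continuity correction) and its 1-df P value.

    a, b: carriers among cases and controls; c, d: non-carriers among cases
    and controls. For 1 df, P(X >= chi2) = erfc(sqrt(chi2 / 2)).
    """
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    return chi2, math.erfc(math.sqrt(chi2 / 2))


def pair_haplotype_test(snp_a, snp_b, cases, controls):
    """Test the haplotype made of one block A allele and one block B allele.

    snp_a and snp_b are (SNP name, allele) tuples; cases and controls are
    lists of per-man genotype dicts (X-linked, so one allele per SNP).
    """
    def carries(g):
        return g[snp_a[0]] == snp_a[1] and g[snp_b[0]] == snp_b[1]

    a = sum(carries(g) for g in cases)
    b = sum(carries(g) for g in controls)
    return chi2_1df(a, b, len(cases) - a, len(controls) - b)


# Hypothetical genotypes (toy data): 40 cases and 40 controls.
cases = [{"A1": "T", "B1": "G"}] * 14 + [{"A1": "C", "B1": "G"}] * 26
controls = ([{"A1": "T", "B1": "G"}] * 4 + [{"A1": "T", "B1": "A"}] * 10
            + [{"A1": "C", "B1": "G"}] * 26)
chi2, p = pair_haplotype_test(("A1", "T"), ("B1", "G"), cases, controls)
print(f"chi2 = {chi2:.2f}, P = {p:.4f}")
```

In this toy example the block A allele alone is equally common in cases and controls (14 of 40 each), so only the pair haplotype separates the groups. This is the kind of signal a haplotype straddling an LD break is suited to detect. The sketch assumes every margin of the 2x2 table is non-empty.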

Our study sought to identify the genetic variant predisposing
to familial prostate cancer at Xq27, a locus initially
identified by linkage study of American, Swedish, and
Finnish hereditary prostate cancer pedigrees, and subsequently
refined by linkage disequilibrium analysis of the
Finnish familial prostate cancer cases. After a comprehensive
effort in the present study, we identified a single candidate
haplotype that was associated with familial prostate
cancer  within  independent  training  and  test  study  subjects. 
Although  the  replication  was  encouraging,  the  sample  size 
of  our  test  group  was  sufficiently  small  that  an  independent 
assessment of significance is warranted. Population structure
is unlikely to represent a confounding factor within our
study,  as  self-described  ethnicity  has  recently  been  shown 
to  accurately  represent  genetic  ancestry  among  Americans 
of  Northern  European  descent  (Hunter  et  al.  2007;  Tang 
et  al.  2005).  We  believe  that  this  haplotype  represents  the 
best  candidate  within  the  region  for  further  investigation 
within  additional  study  populations.  If  confirmed,  these 
findings  begin  to  clarify  the  X-linked  heritable  component 
of  prostate  cancer  risk. 